Distance Transformation for Effective Dimension Reduction of High-Dimensional Data
Abstract
In this paper we address the problem of high dimensionality for data that lie on complex manifolds. In high-dimensional spaces, the distances from a point to its nearest and farthest neighbours tend to become equal. This behaviour makes data analysis tasks such as clustering harder. We show that distance transformation can be used effectively to obtain an embedding space that has lower dimensionality than the original space and that improves the quality of data analysis. The new method, called High-Dimensional Multimodal Embedding (HDME), is compared with known state-of-the-art methods operating in high-dimensional spaces and is shown to be effective in terms of both retrieval and clustering on real-world data.
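The distance-concentration effect mentioned in the abstract can be observed numerically. The following is a minimal sketch, not the HDME method: it assumes points drawn uniformly from the unit hypercube and Euclidean distance, and the function name `relative_contrast`, the sample size, and the seed are illustrative choices that do not come from the paper. It measures the relative contrast (d_max - d_min) / d_min between one query point and a random sample, which tends to shrink as the dimension grows.

```python
import numpy as np


def relative_contrast(dim, n_points=1000, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query to n_points uniform random points in [0, 1]^dim.

    Illustrative assumption: uniform data, not data on a manifold.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))   # sample of the data set
    query = rng.random(dim)                # a single query point
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


# Compare the contrast across increasing dimensions.
contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={value:.3f}")
```

Under these assumptions the contrast in two dimensions is large, because some sampled point lands very close to the query, while in a thousand dimensions the nearest and farthest neighbours lie at almost the same distance. This loss of contrast is what the abstract describes as making clustering harder.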
Similar resources
Distance Preserving Dimension Reduction for Manifold Learning
Manifold learning is an effective methodology for extracting nonlinear structures from high-dimensional data with many applications in image analysis, computer vision, text data analysis and bioinformatics. The focus of this paper is on developing algorithms for reducing the computational complexity of manifold learning algorithms, in particular, we consider the case when the number of features...
Unsupervised Dimension Reduction of High-Dimensional Data for Cluster Preservation
High-dimensional data is receiving increasing attention in more and more application fields, but the analysis of such data has shown to be difficult due to the “curse of dimensionality”. Dimension reduction methods have emerged as successful tools to overcome the problem of high-dimensionality. However, even if they are designed to preserve the most important properties of the data, they are ge...
Dimension Reduction for Linear Separation with Curvilinear Distances
Any high dimensional data in its original raw form may contain obviously classifiable clusters which are difficult to identify given the high-dimension representation. In reducing the dimensions it may be possible to perform a simple classification technique to extract this cluster information whilst retaining the overall topology of the data set. The supervised method presented here takes a hi...
High-Throughput Multi-dimensional Scaling (HiT-MDS): New Variant of MDS
Visualization is a useful tool for data analysis, especially when the data is unknown. However, when the dimension is huge, producing a robust visualization is difficult. Therefore, a dimension reduction technique is needed. Multi-dimensional Scaling (MDS) is one of the best techniques for dimension reduction, and in this paper one of its variants, which is focused on high-throughput data, ca...
A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters
Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...